26 research outputs found

    Estimating Photometric Redshifts of Quasars via K-nearest Neighbor Approach Based on Large Survey Databases

    We apply a lazy learning method, the k-nearest neighbor (kNN) algorithm, to estimate the photometric redshifts of quasars, using four samples drawn from the Sloan Digital Sky Survey (SDSS), the UKIRT Infrared Deep Sky Survey (UKIDSS) and the Wide-field Infrared Survey Explorer (WISE): the SDSS sample, the SDSS-UKIDSS sample, the SDSS-WISE sample and the SDSS-UKIDSS-WISE sample. The influence of the k value and of different input patterns on the performance of kNN is discussed. The value of k at which kNN performs best differs with the input pattern and the dataset. The best result is obtained with the SDSS-UKIDSS-WISE sample. The experimental results show that, in general, the more information available from more bands, the better the performance of photometric redshift estimation with kNN. The results also demonstrate that kNN using multiband data can effectively solve the catastrophic failure of photometric redshift estimation, which many machine learning methods encounter. Compared with various other methods for photometric redshift estimation of quasars, kNN based on a KD-tree gives the best accuracy in our case.
    Comment: 28 pages, 4 figures, 3 tables, accepted for publication in A
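As a rough illustration of the kNN idea described in this abstract, the sketch below predicts a redshift as the mean spectroscopic redshift of the k nearest training objects in feature space. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `knn_photo_z`, the two-component feature vectors and the toy values are hypothetical, and it uses a brute-force search rather than the KD-tree mentioned in the abstract.

```python
import math


def knn_photo_z(train_features, train_z, query, k=3):
    """Estimate a redshift as the mean redshift of the k nearest neighbors.

    Hypothetical helper: features are assumed to be colors or magnitudes,
    compared with a plain Euclidean distance (an assumption, not a detail
    taken from the abstract).
    """
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the training-set size")
    # Brute-force search: distance from the query to every training object.
    by_distance = sorted(
        (math.dist(feat, query), z) for feat, z in zip(train_features, train_z)
    )
    nearest = by_distance[:k]
    return sum(z for _, z in nearest) / k


# Toy training set (made-up numbers, for illustration only).
train_features = [(0.1, 0.2), (0.2, 0.1), (0.9, 1.0), (1.0, 0.9), (1.1, 1.1)]
train_z = [0.5, 0.6, 2.0, 2.2, 2.4]

estimate = knn_photo_z(train_features, train_z, query=(1.0, 1.0), k=3)
print(f"estimated redshift: {estimate:.2f}")
```

Varying `k` and the set of input features in a sketch like this corresponds to the "k value" and "input pattern" choices the abstract discusses.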

    Memory-Gated Recurrent Networks

    The essence of multivariate sequential learning is extracting dependencies in data. Data sets such as hourly medical records in intensive care units and multi-frequency phonetic time series often exhibit not only strong serial dependencies in the individual components (the "marginal" memory) but also non-negligible memories in the cross-sectional dependencies (the "joint" memory). Because of the multivariate complexity in the evolution of the joint distribution that underlies the data generating process, we take a data-driven approach and construct a novel recurrent network architecture, termed Memory-Gated Recurrent Networks (mGRN), with gates explicitly regulating two distinct types of memories: the marginal memory and the joint memory. Through a combination of comprehensive simulation studies and empirical experiments on a range of public datasets, we show that our proposed mGRN architecture consistently outperforms state-of-the-art architectures targeting multivariate time series.
    Comment: This paper was accepted and will be published in the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21

    The Causal Learning of Retail Delinquency

    This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be very large. We therefore propose another approach to constructing the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of the classical estimators and the proposed estimators in estimating the causal quantities. The comparison is carried out across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is substantial if the causal effects are accounted for correctly.
    Comment: This paper was accepted and will be published in the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21

    SDSS quasars in the WISE preliminary data release and quasar candidate selection with optical/infrared colors

    We present a catalog of 37,842 quasars in the SDSS Data Release 7, which have counterparts within 6" in the WISE Preliminary Data Release. The overall WISE detection rate of the SDSS quasars is 86.7%, and it decreases to less than 50.0% when the quasar magnitude is fainter than i = 20.5. We derive the median color-redshift relations based on this SDSS-WISE quasar sample and apply them to estimate the photometric redshifts of the SDSS-WISE quasars. We find that by adding the WISE W1- and W2-band data to the SDSS photometry we can increase the photometric redshift reliability, defined as the percentage of sources with the photometric and spectroscopic redshift difference less than 0.2, from 70.3% to 77.2%. We also obtain the samples of WISE-detected normal and late-type stars with SDSS spectroscopy, and present a criterion in the z-W1 versus g-z color-color diagram, z-W1 > 0.66(g-z) + 2.01, to separate quasars from stars. With this criterion we can recover 98.6% of 3089 radio-detected SDSS-WISE quasars with redshifts less than four and overcome the difficulty in selecting quasars with redshifts between 2.2 and 3 from SDSS photometric data alone. We also suggest another criterion involving the WISE color only, W1-W2 > 0.57, to efficiently separate quasars with redshifts less than 3.2 from stars. In addition, we compile a catalog of 5614 SDSS quasars detected by both WISE and UKIDSS surveys and present their color-redshift relations in the optical and infrared bands. By using the SDSS ugriz, UKIDSS YJHK and WISE W1- and W2-band photometric data, we can efficiently select quasar candidates and increase the photometric redshift reliability up to 87.0%. We discuss the implications of our results on the future quasar surveys. An updated SDSS-WISE quasar catalog consisting of 101,853 quasars with the recently released WISE all-sky data is also provided.
    Comment: 27 pages, 9 figures and 5 tables. Revised to match the published version in the Astronomical Journal. 5 tables are available electronically at (http://vega.bac.pku.edu.cn/~wuxb/sdsswiseqso.htm). A new SDSS-WISE quasar catalog consisting of 101,853 quasars with the WISE all-sky data is available as Table
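The two color cuts and the reliability measure quoted in this abstract can be written down directly. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the example magnitudes are made up, the inputs are assumed to be in whatever magnitude systems the authors used (the abstract does not say), and the "redshift difference less than 0.2" is assumed to mean the absolute difference.

```python
def passes_optical_ir_cut(g, z, w1):
    """Criterion from the abstract: z-W1 > 0.66(g-z) + 2.01."""
    return (z - w1) > 0.66 * (g - z) + 2.01


def passes_wise_only_cut(w1, w2):
    """Criterion from the abstract: W1-W2 > 0.57."""
    return (w1 - w2) > 0.57


def photo_z_reliability(z_phot, z_spec, tol=0.2):
    """Percentage of sources whose photometric and spectroscopic redshifts
    differ by less than tol (absolute difference is an assumption)."""
    if len(z_phot) != len(z_spec) or not z_phot:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for p, s in zip(z_phot, z_spec) if abs(p - s) < tol)
    return 100.0 * hits / len(z_phot)


# Made-up magnitudes, for illustration only.
red_in_ir = passes_optical_ir_cut(g=19.0, z=18.5, w1=15.0)
blue_in_ir = passes_optical_ir_cut(g=18.0, z=16.5, w1=15.0)
reliability = photo_z_reliability([1.0, 2.0, 3.0, 0.5], [1.1, 2.5, 3.05, 0.45])
print(red_in_ir, blue_in_ir, reliability)
```

Both cuts are strict inequalities, as written in the abstract, so an object lying exactly on the dividing line is not selected.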

    Selecting Quasar Candidates by a SVM Classification System

    We develop and demonstrate a classification system composed of several Support Vector Machine (SVM) classifiers, which can be applied to select quasar candidates from large sky survey projects such as SDSS, UKIDSS and GALEX. The construction of this SVM classification system is presented in detail. When the SVM classification system is applied to the test set to predict quasar candidates, it achieves an efficiency of 93.21% and a completeness of 97.49%. To further demonstrate the reliability and feasibility of this system, two chunks are randomly chosen to compare its performance with that of the XDQSO method used for SDSS-III's BOSS. The experimental results show a high fraction of overlap between the quasar candidates selected by this system and those extracted by the XDQSO technique in the dereddened i-band magnitude range between 17.75 and 22.45, especially in the interval of dereddened i-band magnitude < 20.0. In the two test areas, 57.38% and 87.15% of the quasar candidates predicted by the system are also targeted by the XDQSO method. Similarly, the predictions of quasar subcategories according to redshift show a high level of overlap between these two approaches. Given its effectiveness, the SVM classification system can be used to create the input catalog of quasars for the GuoShouJing Telescope (LAMOST) or other spectroscopic sky survey projects. To obtain quasar candidates with higher confidence, the candidates selected by this SVM system can be cross-matched with those selected by the XDQSO method.
    Comment: 11 pages, 4 figures and 7 tables, MNRAS accepte
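The abstract quotes an efficiency and a completeness without defining them. The sketch below assumes the definitions commonly used for candidate selection: efficiency is the percentage of selected candidates that are true quasars, and completeness is the percentage of true quasars that are selected. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
def efficiency_and_completeness(predicted, actual):
    """Return (efficiency, completeness) in percent.

    predicted[i] is True if object i was selected as a quasar candidate;
    actual[i] is True if object i really is a quasar. The definitions are
    an assumption, since the abstract does not spell them out.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    true_positives = sum(1 for p, a in zip(predicted, actual) if p and a)
    selected = sum(1 for p in predicted if p)
    true_quasars = sum(1 for a in actual if a)
    efficiency = 100.0 * true_positives / selected if selected else 0.0
    completeness = 100.0 * true_positives / true_quasars if true_quasars else 0.0
    return efficiency, completeness


# Toy labels: four objects selected, three of them are true quasars.
predicted = [True, True, True, True, False]
actual = [True, True, True, False, False]
efficiency, completeness = efficiency_and_completeness(predicted, actual)
print(efficiency, completeness)
```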

    Screening Genes Promoting Exit from Naive Pluripotency Based on Genome-Scale CRISPR-Cas9 Knockout

    Two of the main problems in stem cell research and regenerative medicine are the exit from pluripotency and differentiation into functional cells or tissues. Solving these two problems would be of great value for the clinical translation of stem cell and regenerative medicine research. Although a growing body of research has shed light on pluripotency maintenance, the mechanisms underlying pluripotent cell self-renewal, proliferation, and differentiation into specific cell lineages or tissues are yet to be defined. To this end, we took full advantage of a novel technology, namely, the genome-scale CRISPR-Cas9 knockout (GeCKO). As an effective way of introducing targeted loss-of-function mutations at specific sites in the genome, GeCKO makes it possible, for the first time, to screen in an unbiased manner for key genes that promote exit from pluripotency in mouse embryonic stem cells (mESCs). In this study, we successfully established a model based on GeCKO to screen the key genes in pluripotency withdrawal. Our strategies included lentiviral packaging and infection technology, lenti-Cas9 gene knockout technology, shRNA gene knockdown technology, next-generation sequencing, model-based analysis of genome-scale CRISPR-Cas9 knockout (MAGeCK analysis), GO analysis, and other methods. Our findings provide a novel approach for large-scale screening of genes involved in pluripotency exit and offer an entry point for cell fate regulation research.